Overview
Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 72458 |
| Missing cells | 32260 |
| Missing cells (%) | 3.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 7.8 MiB |
| Average record size in memory | 113.0 B |
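The summary figures can be cross-checked against one another. The following Python sketch recomputes the missing-cell percentage and the average record size from the table above, and confirms that the per-variable missing counts given in this report add up to the reported total. The variable names are illustrative, and the memory figure is rounded in the report, so the record size is only approximate.

```python
# Figures copied from the Dataset statistics table.
n_variables = 15
n_observations = 72458
missing_cells = 32260
total_size_mib = 7.8  # rounded in the report, so derived values are approximate

# Per-variable missing counts as reported for each variable (all others are 0).
missing_by_variable = {
    "is_employed": 25515,
    "housing_type": 1686,
    "num_vehicles": 1686,
    "gas_usage": 1686,
    "recent_move_b": 1687,
}

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
missing_sum = sum(missing_by_variable.values())
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"total cells:         {total_cells}")
print(f"missing cells:       {missing_pct:.1f}%")
print(f"sum of missing:      {missing_sum}")
print(f"average record size: {avg_record_bytes:.0f} B")
```

The percentage is taken over all cells (variables × observations), not over rows, which is why 32260 missing cells amount to only about 3%.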
Variable types
| Numeric | 7 |
|---|---|
| Text | 2 |
| Categorical | 3 |
| Boolean | 3 |
Alerts
| Alert | Type |
|---|---|
| is_employed is highly imbalanced (71.7%) | Imbalance |
| health_ins is highly imbalanced (54.6%) | Imbalance |
| is_employed has 25515 (35.2%) missing values | Missing |
| housing_type has 1686 (2.3%) missing values | Missing |
| num_vehicles has 1686 (2.3%) missing values | Missing |
| gas_usage has 1686 (2.3%) missing values | Missing |
| recent_move_b has 1687 (2.3%) missing values | Missing |
| Unnamed: 0 has unique values | Unique |
| custid has unique values | Unique |
| income has 6691 (9.2%) zeros | Zeros |
| num_vehicles has 4636 (6.4%) zeros | Zeros |
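The imbalance percentages reported for is_employed and health_ins are not the share of the majority class. A minimal sketch, assuming the score is one minus the Shannon entropy of the class shares normalized by the maximum entropy for that number of classes, and assuming missing values are excluded, reproduces both figures from the counts given in the variable sections of this report. The function name `imbalance_score` is illustrative, not the library's API.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy of the class shares."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))


# Non-missing class counts taken from the variable sections of this report.
is_employed_score = imbalance_score([44630, 2313])  # True, False
health_ins_score = imbalance_score([65553, 6905])   # True, False

print(f"is_employed: {is_employed_score:.1%}")
print(f"health_ins:  {health_ins_score:.1%}")
```

Under these assumptions the scores come out at about 71.7% and 54.6%, matching the alerts. A perfectly balanced variable would score 0% and a constant one 100%.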
Reproduction
| Analysis started | 2024-10-18 11:09:01.675022 |
|---|---|
| Analysis finished | 2024-10-18 11:09:12.721890 |
| Duration | 11.05 seconds |
| Software version | ydata-profiling v4.11.0 |
| Download configuration | config.json |
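The reported duration follows directly from the two timestamps. A short Python check, using only the values in the table above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2024-10-18 11:09:01.675022")
finished = datetime.fromisoformat("2024-10-18 11:09:12.721890")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```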
Variables
Unnamed: 0
Real number (ℝ)
Unique 
| Distinct | 72458 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49910.638 |
| Minimum | 7 |
|---|---|
| Maximum | 100000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 566.2 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 5122.7 |
| Q1 | 24911.25 |
| median | 49838 |
| Q3 | 74786.75 |
| 95-th percentile | 95058.3 |
| Maximum | 100000 |
| Range | 99993 |
| Interquartile range (IQR) | 49875.5 |
Descriptive statistics
| Standard deviation | 28772.083 |
|---|---|
| Coefficient of variation (CV) | 0.57647195 |
| Kurtosis | -1.1937678 |
| Mean | 49910.638 |
| Median Absolute Deviation (MAD) | 24938 |
| Skewness | 0.0066722823 |
| Sum | 3.616425 × 10⁹ |
| Variance | 8.2783274 × 10⁸ |
| Monotonicity | Strictly increasing |
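Several of the descriptive statistics are derived from the others, which gives a quick consistency check. The sketch below recomputes the coefficient of variation, variance, interquartile range, range, and sum for Unnamed: 0 from the figures in the tables above. The inputs are the rounded values printed in the report, so agreement is only expected to the displayed precision.

```python
# Figures copied from the Unnamed: 0 tables.
n_observations = 72458
mean = 49910.638
std = 28772.083
q1, q3 = 24911.25, 74786.75
minimum, maximum = 7, 100000

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n_observations    # sum of all values

print(f"CV:       {cv:.6f}")
print(f"Variance: {variance:.4e}")
print(f"IQR:      {iqr}")
print(f"Range:    {value_range}")
print(f"Sum:      {total:.6e}")
```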
| Value | Count | Frequency (%) |
| 7 | 1 | < 0.1% |
| 66447 | 1 | < 0.1% |
| 66456 | 1 | < 0.1% |
| 66455 | 1 | < 0.1% |
| 66454 | 1 | < 0.1% |
| 66453 | 1 | < 0.1% |
| 66450 | 1 | < 0.1% |
| 66449 | 1 | < 0.1% |
| 66446 | 1 | < 0.1% |
| 66458 | 1 | < 0.1% |
| Other values (72448) | 72448 |
| Value | Count | Frequency (%) |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 15 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 19 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
| 21 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 100000 | 1 | < 0.1% |
| 99999 | 1 | < 0.1% |
| 99998 | 1 | < 0.1% |
| 99997 | 1 | < 0.1% |
| 99996 | 1 | < 0.1% |
| 99995 | 1 | < 0.1% |
| 99994 | 1 | < 0.1% |
| 99993 | 1 | < 0.1% |
| 99991 | 1 | < 0.1% |
| 99990 | 1 | < 0.1% |
custid
Text
Unique 
| Distinct | 72458 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 566.2 KiB |
Length
| Max length | 12 |
|---|---|
| Median length | 12 |
| Mean length | 12 |
| Min length | 12 |
Unique
| Unique | 72458 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 000006646_03 |
|---|---|
| 2nd row | 000007827_01 |
| 3rd row | 000008359_04 |
| 4th row | 000008529_01 |
| 5th row | 000008744_02 |
| Value | Count | Frequency (%) |
| 000006646_03 | 1 | < 0.1% |
| 000026926_01 | 1 | < 0.1% |
| 000015018_01 | 1 | < 0.1% |
| 000017314_02 | 1 | < 0.1% |
| 000054759_02 | 1 | < 0.1% |
| 000017383_04 | 1 | < 0.1% |
| 000019351_01 | 1 | < 0.1% |
| 000019351_02 | 1 | < 0.1% |
| 000028817_02 | 1 | < 0.1% |
| 000008744_02 | 1 | < 0.1% |
| Other values (72448) | 72448 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 311165 | 35.8% |
| 1 | 109450 | 12.6% |
| _ | 72458 | 8.3% |
| 2 | 69823 | 8.0% |
| 3 | 51814 | 6.0% |
| 4 | 47832 | 5.5% |
| 5 | 43051 | 5.0% |
| 6 | 41249 | 4.7% |
| 7 | 40992 | 4.7% |
| 9 | 40896 | 4.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 869496 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 869496 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 869496 |
sex
Categorical
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.0340059 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Female |
| 3rd row | Female |
| 4th row | Female |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
| Female | 37461 | 51.7% |
| Male | 34997 | 48.3% |
Words
| Value | Count | Frequency (%) |
| female | 37461 | 51.7% |
| male | 34997 | 48.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 109919 | 30.1% |
| a | 72458 | 19.9% |
| l | 72458 | 19.9% |
| F | 37461 | 10.3% |
| m | 37461 | 10.3% |
| M | 34997 | 9.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 364754 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 364754 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 364754 |
is_employed
Boolean
Imbalance  Missing 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 25515 |
| Missing (%) | 35.2% |
| Memory size | 566.2 KiB |
| Value | Count | Frequency (%) |
| True | 44630 | 61.6% |
| False | 2313 | 3.2% |
| (Missing) | 25515 | 35.2% |
income
Real number (ℝ)
Zeros 
| Distinct | 4445 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41881.435 |
| Minimum | -6900 |
|---|---|
| Maximum | 1257000 |
| Zeros | 6691 |
| Zeros (%) | 9.2% |
| Negative | 45 |
| Negative (%) | 0.1% |
| Memory size | 566.2 KiB |
Quantile statistics
| Minimum | -6900 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 10700 |
| median | 26400 |
| Q3 | 52000 |
| 95-th percentile | 125000 |
| Maximum | 1257000 |
| Range | 1263900 |
| Interquartile range (IQR) | 41300 |
Descriptive statistics
| Standard deviation | 58274.605 |
|---|---|
| Coefficient of variation (CV) | 1.3914185 |
| Kurtosis | 37.944025 |
| Mean | 41881.435 |
| Median Absolute Deviation (MAD) | 18400 |
| Skewness | 4.87276 |
| Sum | 3.034645 × 10⁹ |
| Variance | 3.3959296 × 10⁹ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 6691 | 9.2% |
| 30000 | 1650 | 2.3% |
| 20000 | 1394 | 1.9% |
| 40000 | 1390 | 1.9% |
| 50000 | 1357 | 1.9% |
| 12000 | 1126 | 1.6% |
| 25000 | 1094 | 1.5% |
| 60000 | 1053 | 1.5% |
| 35000 | 939 | 1.3% |
| 15000 | 840 | 1.2% |
| Other values (4435) | 54924 |
| Value | Count | Frequency (%) |
| -6900 | 1 | < 0.1% |
| -6800 | 2 | < 0.1% |
| -6700 | 2 | < 0.1% |
| -6600 | 1 | < 0.1% |
| -6100 | 1 | < 0.1% |
| -6000 | 5 | < 0.1% |
| -5900 | 1 | < 0.1% |
| -5800 | 1 | < 0.1% |
| -5700 | 1 | < 0.1% |
| -5500 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 1257000 | 1 | < 0.1% |
| 1051000 | 2 | < 0.1% |
| 997000 | 1 | < 0.1% |
| 897100 | 1 | < 0.1% |
| 868200 | 1 | < 0.1% |
| 861000 | 1 | < 0.1% |
| 859000 | 1 | < 0.1% |
| 812000 | 3 | < 0.1% |
| 787000 | 1 | < 0.1% |
| 766000 | 1 | < 0.1% |
marital_status
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 566.2 KiB |
Length
| Max length | 18 |
|---|---|
| Median length | 7 |
| Mean length | 10.188219 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Never married |
|---|---|
| 2nd row | Divorced/Separated |
| 3rd row | Never married |
| 4th row | Widowed |
| 5th row | Divorced/Separated |
Common Values
| Value | Count | Frequency (%) |
| Married | 38040 | 52.5% |
| Never married | 19120 | 26.4% |
| Divorced/Separated | 10572 | 14.6% |
| Widowed | 4726 | 6.5% |
Words
| Value | Count | Frequency (%) |
| married | 57160 | 62.4% |
| never | 19120 | 20.9% |
| divorced/separated | 10572 | 11.5% |
| widowed | 4726 | 5.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| r | 154584 | 20.9% |
| e | 131842 | 17.9% |
| d | 87756 | 11.9% |
| a | 78304 | 10.6% |
| i | 72458 | 9.8% |
| M | 38040 | 5.2% |
| v | 29692 | 4.0% |
| (space) | 19120 | 2.6% |
| m | 19120 | 2.6% |
| N | 19120 | 2.6% |
| Other values (9) | 88182 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 738218 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 738218 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 738218 |
health_ins
Boolean
Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 70.9 KiB |
| Value | Count | Frequency (%) |
| True | 65553 | 90.5% |
| False | 6905 | 9.5% |
housing_type
Categorical
Missing 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1686 |
| Missing (%) | 2.3% |
| Memory size | 566.2 KiB |
Length
| Max length | 28 |
|---|---|
| Median length | 24 |
| Mean length | 20.125586 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Homeowner free and clear |
|---|---|
| 2nd row | Rented |
| 3rd row | Homeowner with mortgage/loan |
| 4th row | Homeowner free and clear |
| 5th row | Rented |
Common Values
| Value | Count | Frequency (%) |
| Homeowner with mortgage/loan | 31092 | 42.9% |
| Rented | 21956 | 30.3% |
| Homeowner free and clear | 16604 | 22.9% |
| Occupied with no rent | 1120 | 1.5% |
| (Missing) | 1686 | 2.3% |
Words
| Value | Count | Frequency (%) |
| homeowner | 47696 | 25.6% |
| with | 32212 | 17.3% |
| mortgage/loan | 31092 | 16.7% |
| rented | 21956 | 11.8% |
| free | 16604 | 8.9% |
| and | 16604 | 8.9% |
| clear | 16604 | 8.9% |
| occupied | 1120 | 0.6% |
| no | 1120 | 0.6% |
| rent | 1120 | 0.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 222448 | 15.6% |
| o | 158696 | 11.1% |
| n | 119588 | 8.4% |
| (space) | 115356 | 8.1% |
| r | 113116 | 7.9% |
| a | 95392 | 6.7% |
| t | 86380 | 6.1% |
| w | 79908 | 5.6% |
| m | 78788 | 5.5% |
| g | 62184 | 4.4% |
| Other values (12) | 292472 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1424328 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1424328 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1424328 |
num_vehicles
Real number (ℝ)
Missing  Zeros 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1686 |
| Missing (%) | 2.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.0668202 |
| Minimum | 0 |
|---|---|
| Maximum | 6 |
| Zeros | 4636 |
| Zeros (%) | 6.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 566.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 4 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.170076 |
|---|---|
| Coefficient of variation (CV) | 0.56612374 |
| Kurtosis | 0.80250553 |
| Mean | 2.0668202 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.67181187 |
| Sum | 146273 |
| Variance | 1.3690778 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 28052 | 38.7% |
| 1 | 17445 | 24.1% |
| 3 | 13094 | 18.1% |
| 4 | 5100 | 7.0% |
| 0 | 4636 | 6.4% |
| 5 | 1628 | 2.2% |
| 6 | 817 | 1.1% |
| (Missing) | 1686 | 2.3% |
| Value | Count | Frequency (%) |
| 0 | 4636 | 6.4% |
| 1 | 17445 | 24.1% |
| 2 | 28052 | 38.7% |
| 3 | 13094 | 18.1% |
| 4 | 5100 | 7.0% |
| 5 | 1628 | 2.2% |
| 6 | 817 | 1.1% |
| Value | Count | Frequency (%) |
| 6 | 817 | 1.1% |
| 5 | 1628 | 2.2% |
| 4 | 5100 | 7.0% |
| 3 | 13094 | 18.1% |
| 2 | 28052 | 38.7% |
| 1 | 17445 | 24.1% |
| 0 | 4636 | 6.4% |
age
Real number (ℝ)
| Distinct | 81 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.208893 |
| Minimum | 0 |
|---|---|
| Maximum | 120 |
| Zeros | 77 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 566.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 23 |
| Q1 | 34 |
| median | 48 |
| Q3 | 62 |
| 95-th percentile | 80 |
| Maximum | 120 |
| Range | 120 |
| Interquartile range (IQR) | 28 |
Descriptive statistics
| Standard deviation | 18.090035 |
|---|---|
| Coefficient of variation (CV) | 0.36761718 |
| Kurtosis | -0.39362047 |
| Mean | 49.208893 |
| Median Absolute Deviation (MAD) | 14 |
| Skewness | 0.37562027 |
| Sum | 3565578 |
| Variance | 327.24935 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 26 | 1462 | 2.0% |
| 45 | 1433 | 2.0% |
| 30 | 1420 | 2.0% |
| 54 | 1414 | 2.0% |
| 25 | 1404 | 1.9% |
| 27 | 1394 | 1.9% |
| 53 | 1381 | 1.9% |
| 56 | 1379 | 1.9% |
| 46 | 1368 | 1.9% |
| 21 | 1350 | 1.9% |
| Other values (71) | 58453 |
| Value | Count | Frequency (%) |
| 0 | 77 | 0.1% |
| 21 | 1350 | 1.9% |
| 22 | 1299 | 1.8% |
| 23 | 1324 | 1.8% |
| 24 | 1293 | 1.8% |
| 25 | 1404 | 1.9% |
| 26 | 1462 | 2.0% |
| 27 | 1394 | 1.9% |
| 28 | 1333 | 1.8% |
| 29 | 1265 | 1.7% |
| Value | Count | Frequency (%) |
| 120 | 66 | 0.1% |
| 114 | 60 | 0.1% |
| 110 | 65 | 0.1% |
| 100 | 71 | 0.1% |
| 96 | 6 | < 0.1% |
| 95 | 90 | 0.1% |
| 94 | 275 | 0.4% |
| 93 | 123 | 0.2% |
| 92 | 83 | 0.1% |
| 91 | 57 | 0.1% |
state_of_res
Text
| Distinct | 51 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 566.2 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 13 |
| Mean length | 8.4383781 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Alabama |
|---|---|
| 2nd row | Alabama |
| 3rd row | Alabama |
| 4th row | Alabama |
| 5th row | Alabama |
| Value | Count | Frequency (%) |
| california | 8870 | 10.5% |
| new | 7114 | 8.4% |
| texas | 5938 | 7.0% |
| florida | 4921 | 5.8% |
| york | 4375 | 5.2% |
| carolina | 3496 | 4.1% |
| pennsylvania | 2968 | 3.5% |
| illinois | 2896 | 3.4% |
| ohio | 2587 | 3.1% |
| north | 2498 | 3.0% |
| Other values (45) | 38730 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 79795 | 13.1% |
| i | 70388 | 11.5% |
| n | 52941 | 8.7% |
| o | 50637 | 8.3% |
| s | 39554 | 6.5% |
| r | 39013 | 6.4% |
| e | 37030 | 6.1% |
| l | 31315 | 5.1% |
| t | 14701 | 2.4% |
| C | 14635 | 2.4% |
| Other values (36) | 181419 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 611428 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 611428 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 611428 |
code_column
Real number (ℝ)
| Distinct | 49 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3285.5236 |
| Minimum | 131 |
|---|---|
| Maximum | 8962 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 566.2 KiB |
Quantile statistics
| Minimum | 131 |
|---|---|
| 5-th percentile | 407 |
| Q1 | 1305 |
| median | 2269 |
| Q3 | 4979 |
| 95-th percentile | 8962 |
| Maximum | 8962 |
| Range | 8831 |
| Interquartile range (IQR) | 3674 |
Descriptive statistics
| Standard deviation | 2661.7752 |
|---|---|
| Coefficient of variation (CV) | 0.81015253 |
| Kurtosis | -0.11789355 |
| Mean | 3285.5236 |
| Median Absolute Deviation (MAD) | 1222 |
| Skewness | 1.0290568 |
| Sum | 2.3806247 × 10⁸ |
| Variance | 7085047.4 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8962 | 8870 | 12.2% |
| 6026 | 5938 | 8.2% |
| 4979 | 4921 | 6.8% |
| 4431 | 4375 | 6.0% |
| 2997 | 2968 | 4.1% |
| 2925 | 2896 | 4.0% |
| 2614 | 2587 | 3.6% |
| 2357 | 2329 | 3.2% |
| 2269 | 2246 | 3.1% |
| 2198 | 2177 | 3.0% |
| Other values (39) | 33151 |
| Value | Count | Frequency (%) |
| 131 | 130 | 0.2% |
| 146 | 146 | 0.2% |
| 162 | 160 | 0.2% |
| 170 | 337 | 0.5% |
| 188 | 186 | 0.3% |
| 204 | 198 | 0.3% |
| 218 | 216 | 0.3% |
| 220 | 218 | 0.3% |
| 307 | 305 | 0.4% |
| 325 | 319 | 0.4% |
| Value | Count | Frequency (%) |
| 8962 | 8870 | 12.2% |
| 6026 | 5938 | 8.2% |
| 4979 | 4921 | 6.8% |
| 4431 | 4375 | 6.0% |
| 2997 | 2968 | 4.1% |
| 2925 | 2896 | 4.0% |
| 2614 | 2587 | 3.6% |
| 2357 | 2329 | 3.2% |
| 2269 | 2246 | 3.1% |
| 2198 | 2177 | 3.0% |
gas_usage
Real number (ℝ)
Missing 
| Distinct | 57 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 1686 |
| Missing (%) | 2.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 41.230501 |
| Minimum | 1 |
|---|---|
| Maximum | 570 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 566.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 10 |
| Q3 | 60 |
| 95-th percentile | 160 |
| Maximum | 570 |
| Range | 569 |
| Interquartile range (IQR) | 57 |
Descriptive statistics
| Standard deviation | 63.149323 |
|---|---|
| Coefficient of variation (CV) | 1.5316167 |
| Kurtosis | 13.033371 |
| Mean | 41.230501 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | 3.0309741 |
| Sum | 2917965 |
| Variance | 3987.837 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 24689 | 34.1% |
| 2 | 6534 | 9.0% |
| 30 | 5100 | 7.0% |
| 40 | 4199 | 5.8% |
| 20 | 4118 | 5.7% |
| 50 | 4069 | 5.6% |
| 100 | 2623 | 3.6% |
| 60 | 2563 | 3.5% |
| 1 | 2368 | 3.3% |
| 80 | 2308 | 3.2% |
| Other values (47) | 12201 |
| Value | Count | Frequency (%) |
| 1 | 2368 | 3.3% |
| 2 | 6534 | 9.0% |
| 3 | 24689 | 34.1% |
| 4 | 448 | 0.6% |
| 10 | 1361 | 1.9% |
| 20 | 4118 | 5.7% |
| 30 | 5100 | 7.0% |
| 40 | 4199 | 5.8% |
| 50 | 4069 | 5.6% |
| 60 | 2563 | 3.5% |
| Value | Count | Frequency (%) |
| 570 | 11 | < 0.1% |
| 540 | 9 | < 0.1% |
| 520 | 3 | < 0.1% |
| 510 | 3 | < 0.1% |
| 490 | 35 | < 0.1% |
| 480 | 72 | < 0.1% |
| 470 | 39 | < 0.1% |
| 460 | 42 | < 0.1% |
| 450 | 48 | < 0.1% |
| 440 | 3 | < 0.1% |
rooms
Real number (ℝ)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.4945486 |
| Minimum | 1 |
|---|---|
| Maximum | 6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 566.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 5 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.7065374 |
|---|---|
| Coefficient of variation (CV) | 0.48834274 |
| Kurtosis | -1.2689358 |
| Mean | 3.4945486 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.0064184753 |
| Sum | 253208 |
| Variance | 2.91227 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 12230 | 16.9% |
| 3 | 12134 | 16.7% |
| 5 | 12098 | 16.7% |
| 1 | 12042 | 16.6% |
| 6 | 11999 | 16.6% |
| 4 | 11955 | 16.5% |
| Value | Count | Frequency (%) |
| 1 | 12042 | 16.6% |
| 2 | 12230 | 16.9% |
| 3 | 12134 | 16.7% |
| 4 | 11955 | 16.5% |
| 5 | 12098 | 16.7% |
| 6 | 11999 | 16.6% |
| Value | Count | Frequency (%) |
| 6 | 11999 | 16.6% |
| 5 | 12098 | 16.7% |
| 4 | 11955 | 16.5% |
| 3 | 12134 | 16.7% |
| 2 | 12230 | 16.9% |
| 1 | 12042 | 16.6% |
recent_move_b
Boolean
Missing 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1687 |
| Missing (%) | 2.3% |
| Memory size | 141.6 KiB |
| Value | Count | Frequency (%) |
| False | 61773 | 85.3% |
| True | 8998 | 12.4% |
| (Missing) | 1687 | 2.3% |
Correlations
| | Unnamed: 0 | age | code_column | gas_usage | health_ins | housing_type | income | is_employed | marital_status | num_vehicles | recent_move_b | rooms | sex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 0 | 1.000 | -0.002 | -0.161 | 0.026 | 0.096 | 0.056 | 0.013 | 0.018 | 0.026 | -0.014 | 0.026 | -0.001 | 0.000 |
| age | -0.002 | 1.000 | -0.020 | 0.056 | 0.178 | 0.226 | 0.067 | 0.073 | 0.401 | -0.115 | 0.231 | -0.002 | 0.058 |
| code_column | -0.161 | -0.020 | 1.000 | -0.063 | 0.100 | 0.070 | -0.014 | 0.021 | 0.033 | -0.018 | 0.040 | -0.006 | 0.000 |
| gas_usage | 0.026 | 0.056 | -0.063 | 1.000 | 0.042 | 0.095 | 0.041 | 0.000 | 0.037 | 0.139 | 0.067 | 0.003 | 0.009 |
| health_ins | 0.096 | 0.178 | 0.100 | 0.042 | 1.000 | 0.155 | 0.066 | 0.111 | 0.147 | 0.057 | 0.059 | 0.001 | 0.060 |
| housing_type | 0.056 | 0.226 | 0.070 | 0.095 | 0.155 | 1.000 | 0.066 | 0.065 | 0.179 | 0.220 | 0.312 | 0.000 | 0.013 |
| income | 0.013 | 0.067 | -0.014 | 0.041 | 0.066 | 0.066 | 1.000 | 0.057 | 0.070 | 0.105 | 0.019 | 0.000 | 0.128 |
| is_employed | 0.018 | 0.073 | 0.021 | 0.000 | 0.111 | 0.065 | 0.057 | 1.000 | 0.096 | 0.086 | 0.023 | 0.000 | 0.000 |
| marital_status | 0.026 | 0.401 | 0.033 | 0.037 | 0.147 | 0.179 | 0.070 | 0.096 | 1.000 | 0.211 | 0.125 | 0.003 | 0.160 |
| num_vehicles | -0.014 | -0.115 | -0.018 | 0.139 | 0.057 | 0.220 | 0.105 | 0.086 | 0.211 | 1.000 | 0.123 | -0.001 | 0.072 |
| recent_move_b | 0.026 | 0.231 | 0.040 | 0.067 | 0.059 | 0.312 | 0.019 | 0.023 | 0.125 | 0.123 | 1.000 | 0.000 | 0.000 |
| rooms | -0.001 | -0.002 | -0.006 | 0.003 | 0.001 | 0.000 | 0.000 | 0.000 | 0.003 | -0.001 | 0.000 | 1.000 | 0.000 |
| sex | 0.000 | 0.058 | 0.000 | 0.009 | 0.060 | 0.013 | 0.128 | 0.000 | 0.160 | 0.072 | 0.000 | 0.000 | 1.000 |
Sample
| | Unnamed: 0 | custid | sex | is_employed | income | marital_status | health_ins | housing_type | num_vehicles | age | state_of_res | code_column | gas_usage | rooms | recent_move_b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 000006646_03 | Male | True | 22000.0 | Never married | True | Homeowner free and clear | 0.0 | 24 | Alabama | 1047 | 210.0 | 3 | F |
| 1 | 8 | 000007827_01 | Female | NaN | 23200.0 | Divorced/Separated | True | Rented | 0.0 | 82 | Alabama | 1047 | 3.0 | 6 | T |
| 2 | 9 | 000008359_04 | Female | True | 21000.0 | Never married | True | Homeowner with mortgage/loan | 2.0 | 31 | Alabama | 1047 | 40.0 | 3 | F |
| 3 | 10 | 000008529_01 | Female | NaN | 37770.0 | Widowed | True | Homeowner free and clear | 1.0 | 93 | Alabama | 1047 | 120.0 | 2 | F |
| 4 | 11 | 000008744_02 | Male | True | 39000.0 | Divorced/Separated | True | Rented | 2.0 | 67 | Alabama | 1047 | 3.0 | 2 | F |
| 5 | 15 | 000011466_01 | Male | NaN | 11100.0 | Married | True | Homeowner free and clear | 2.0 | 76 | Alabama | 1047 | 200.0 | 6 | F |
| 6 | 17 | 000015018_01 | Female | True | 25800.0 | Married | False | Rented | 2.0 | 26 | Alabama | 1047 | 3.0 | 3 | F |
| 7 | 19 | 000017314_02 | Female | NaN | 34600.0 | Married | True | Homeowner free and clear | 2.0 | 73 | Alabama | 1047 | 50.0 | 5 | F |
| 8 | 20 | 000017383_04 | Female | True | 25000.0 | Never married | True | Homeowner free and clear | 5.0 | 27 | Alabama | 1047 | 3.0 | 4 | F |
| 9 | 21 | 000017554_02 | Male | True | 31200.0 | Married | True | Homeowner with mortgage/loan | 3.0 | 54 | Alabama | 1047 | 20.0 | 6 | F |
| | Unnamed: 0 | custid | sex | is_employed | income | marital_status | health_ins | housing_type | num_vehicles | age | state_of_res | code_column | gas_usage | rooms | recent_move_b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 72448 | 99990 | 001448933_01 | Male | True | 85000.0 | Married | False | Homeowner with mortgage/loan | 2.0 | 30 | Wyoming | 131 | 30.0 | 4 | F |
| 72449 | 99991 | 001458068_02 | Female | True | 13000.0 | Married | True | Homeowner with mortgage/loan | 3.0 | 47 | Wyoming | 131 | 50.0 | 2 | F |
| 72450 | 99993 | 001493692_02 | Female | True | 7200.0 | Never married | True | Homeowner with mortgage/loan | 3.0 | 33 | Wyoming | 131 | 30.0 | 4 | F |
| 72451 | 99994 | 001494186_02 | Female | True | 44000.0 | Married | True | Homeowner with mortgage/loan | 2.0 | 46 | Wyoming | 131 | 90.0 | 3 | F |
| 72452 | 99995 | 001501555_01 | Female | True | 85000.0 | Married | True | Homeowner with mortgage/loan | 2.0 | 32 | Wyoming | 131 | 70.0 | 5 | F |
| 72453 | 99996 | 001506841_02 | Female | True | 18500.0 | Never married | False | Rented | 1.0 | 25 | Wyoming | 131 | 10.0 | 4 | F |
| 72454 | 99997 | 001507219_01 | Female | NaN | 20800.0 | Widowed | True | Homeowner free and clear | 1.0 | 86 | Wyoming | 131 | 120.0 | 6 | F |
| 72455 | 99998 | 001513103_01 | Male | True | 75000.0 | Married | True | Homeowner with mortgage/loan | 2.0 | 50 | Wyoming | 131 | 90.0 | 3 | F |
| 72456 | 99999 | 001519624_01 | Female | True | 22200.0 | Divorced/Separated | False | Homeowner free and clear | 1.0 | 61 | Wyoming | 131 | 50.0 | 6 | F |
| 72457 | 100000 | 001520877_01 | Male | True | 16400.0 | Never married | True | NaN | NaN | 31 | Wyoming | 131 | NaN | 5 | NaN |